Improved Nearest Neighbor Methods For Text Classification

Authors

Güneş Erkan

Ahmed Hassan

Qian Diao

Dragomir Radev

Abstract

We present new nearest neighbor methods for text classification and evaluate them against existing nearest neighbor methods as well as other well-known text classification algorithms. Inspired by the language modeling approach to information retrieval, we show improvements in k-nearest neighbor (kNN) classification by replacing the classical cosine similarity with a similarity measure based on KL divergence. We also present an extension of kNN to the semi-supervised case, which turns out to be equivalent to semi-supervised learning with harmonic functions. In both supervised and semi-supervised experiments, our algorithms surpass traditional nearest neighbor methods and produce competitive results when compared to state-of-the-art methods such as Support Vector Machines (SVM) and transductive SVM on the Reuters-21578 dataset, the 20 Newsgroups dataset, and the Reuters Corpus Volume I (RCV1) dataset. To our knowledge, this paper presents one of the most comprehensive evaluations of different machine learning algorithms on the entire RCV1 dataset.
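The idea of swapping cosine similarity for a KL-divergence-based measure in kNN can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact similarity formula, so the direction of the divergence (query against training document), the additive smoothing constant `alpha`, and the function names `smoothed_unigram`, `kl_divergence`, and `knn_classify` are all assumptions made for illustration.

```python
import math
from collections import Counter


def smoothed_unigram(tokens, vocab, alpha=0.1):
    """Unigram language model over vocab with additive smoothing (assumed scheme)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def knn_classify(query_tokens, train_docs, k=3, alpha=0.1):
    """Label a query by majority vote among the k training documents
    whose language models have the smallest KL divergence from the query's."""
    vocab = set(query_tokens)
    for tokens, _ in train_docs:
        vocab.update(tokens)
    vocab = sorted(vocab)

    q = smoothed_unigram(query_tokens, vocab, alpha)
    scored = []
    for tokens, label in train_docs:
        d = smoothed_unigram(tokens, vocab, alpha)
        # Lower divergence means the document is treated as a closer neighbor.
        scored.append((kl_divergence(q, d), label))

    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy data, invented for this example only.
train_docs = [
    ("the team won the match".split(), "sports"),
    ("a late goal decided the match".split(), "sports"),
    ("stocks fell as the market closed".split(), "finance"),
    ("the bank raised interest rates".split(), "finance"),
]
predicted = knn_classify("the team scored a goal".split(), train_docs, k=3)
```

Smoothing matters here because KL divergence is undefined when the second distribution assigns zero probability to a word that the first one uses; any smoothing scheme that keeps all probabilities positive would serve the same purpose in this sketch.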

Similar resources

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

Improved Nearest Neighbor Methods For Text Classification With Language Modeling and Harmonic Functions

Integrating Background Knowledge into Nearest-Neighbor Text Classification

This paper describes two different approaches for incorporating background knowledge into nearest-neighbor text classification. Our first approach uses background text to assess the similarity between training and test documents rather than assessing their similarity directly. The second method redescribes examples using Latent Semantic Indexing on the background knowledge, assessing document similari...

Improved Nearest Neighbor Based Approach to Accurate Document Skew Estimation

The nearest-neighbor based document skew detection methods do not require the presence of a predominant text area, and are not subject to skew angle limitation. However, the accuracy of these methods is not perfect in general. In this paper, we present an improved nearest-neighbor based approach to perform accurate document skew estimation. Size restriction is introduced to the detection of nea...

Publication date: 2011
